Abstract. With the rapid development of online social media such as
Twitter and Sina Weibo in China, many users track hot topics to satisfy
their information needs. For a hot topic, new opinions and ideas are
continuously produced in the form of an online data stream. In this
scenario, how to effectively filter and display information for a given
topic dynamically is a critical problem. We call this problem
Topic-focused Dynamic Information Filtering (TDIF for short) in social
media. In this paper, we open a discussion of this application problem.
We first analyze the properties of the TDIF problem, which typically
involves several requirements: relevance, diversity, recency and
confidence. Recency means that users want to follow the most recent
opinions or news. Additionally, the confidence of information must be
taken into consideration. Balancing these factors properly in an online
data stream is both important and challenging. We propose a dynamic
preservation strategy, built on an existing feature-based utility
function, to solve the TDIF problem. Additionally, we propose new dynamic
diversity measures to obtain a more reasonable evaluation for such
application problems. Extensive exploratory experiments have been
conducted on the public TREC Twitter dataset, and the experimental
results validate the effectiveness of our approach.

Keywords: Data Stream, Utility Function, Dynamic Preservation Scheme, 
Evaluation 


1 Introduction 

The development of new social media such as Twitter and Sina Weibo¹
accelerates the spread of online information. In social media, new
information is continuously produced in the form of an online data stream,
and retrieving useful information effectively is very challenging. In
particular, for a hot topic, how to filter and display relevant information
dynamically is a critical problem, which we call Topic-focused Dynamic
Information Filtering (TDIF) in social media.

1 http://weibo.com 
The TDIF problem has three typical requirements: relevance, diversity and
recency. Relevance requires that the tweet information be relevant to the
topic. Diversity requires that the returned tweets describe the topic from
different aspects with little redundancy. Recency means that users want to
follow the most recent opinions or news quickly. Additionally, the human
factor affects the confidence of the tweet information. For example,
tweets released by users with “V” authentication in Sina Weibo usually
have high confidence. Therefore, how to balance these critical factors
becomes a new and challenging problem.

In fact, little prior research has been done to tackle the TDIF problem.
Most existing work focuses on only one or two factors in information
retrieval, such as pure relevance [?], pure diversity [25], or relevance
combined with diversity [?]. Even in industry, the problem has not been
solved well. Existing services, such as Sina Weibo in China, usually
consider only relevance and fail to capture diversity or recency.

In this paper, we utilize the relational learning-to-rank model (R-LTR for
short) [31] as the utility function, and combine it with a dynamic
preservation scheme based on periodic time windows to solve the TDIF
problem. The R-LTR model is a state-of-the-art diverse ranking method,
which models the diversity relations among documents in the ranking
process in addition to the content information of individual documents. It
is a flexible feature-based ranking model that adapts well to different
application scenarios. Although the R-LTR model handles relevance and
diversity well, it is limited to static datasets. Moreover, the R-LTR
model has a time complexity of O(n * k), where n is the number of
candidate objects and k is the number of results to return. Such
efficiency can hardly satisfy the online data stream scenario.

Therefore, we propose a dynamic preservation scheme based on the R-LTR
model. Specifically, we segment the data stream into disjoint periods of
time length T (the segmentation granularity can be days or hours,
depending on the requirements). For each new time window, we preserve the
top-(k - m) most relevant results from the previous result set, then
utilize the R-LTR ranking function to select m new relevant results, and
finally display all k results in chronological order. Here the parameter m
flexibly controls the “staleness” of the returned results, depending on
the requirements of the scenario.

Additionally, due to the new properties of the TDIF application problem,
we also propose new dynamic diversity evaluation measures to obtain a more
reasonable evaluation. In these new measures, we introduce a recency
factor and a confidence factor into existing popular diversity evaluation
measures (i.e., ERR-IA [?], α-NDCG [?] and NRBP [?]). This yields a series
of dynamic diversity evaluation measures: d-ERR, d-NDCG and d-NRBP.

We conduct extensive evaluations on the public TREC Twitter dataset, and
the experimental results show that our approach achieves promising
performance on both traditional diversity measures and the new dynamic
diversity measures. Our approach also has high processing efficiency.

The rest of the paper is organized as follows. Section 2 introduces our
proposed approach to the TDIF problem. Section 3 introduces the new dynamic
diversity evaluation measures. Section 4 presents the experimental results.
Section 5 describes related work and Section 6 concludes the paper.

2 Our Approach 

As described before, the TDIF problem in social media has several typical
requirements: relevance, diversity, recency and confidence. Therefore, the
basic motivation of our approach is to capture and balance these
requirements effectively. In this section, we describe our strategy for
dynamic information filtering, which contains two main parts. The first
part is the choice of the base utility function. The second part is the
design of a dynamic strategy that takes recency into consideration
effectively.


2.1 Utility Function 

The R-LTR model can effectively solve the diverse ranking problem in the
static dataset scenario, modeling both relevance and diversity properly.
As described in the literature [31], the score of a candidate document
contains two parts: a relevance score based on the content information of
the individual document, and a diversity score based on the relationship
between the current document and those previously selected. Let X denote
all the candidate documents, S the previously selected documents, and X\S
the remaining documents. The score function can be formalized as follows.


$$ f_S(x_i, R_i) = \omega_r^T x_i + \omega_d^T h_S(R_i), \quad \forall x_i \in X \setminus S \tag{1} $$

where $x_i$ denotes the relevance feature vector of the candidate document,
$R_i$ stands for the matrix of relationships between document $x_i$ and the
selected documents, each $R_{ij}$ is the diversity feature vector between
documents $x_i$ and $x_j$, written as $(R_{ij1}, \ldots, R_{ijl})$ with
$x_j \in S$, and $R_{ijk}$ stands for the k-th diversity feature between
documents $x_i$ and $x_j$. $h_S(R_i)$ stands for the relational function on
$R_i$, and $\omega_r$ and $\omega_d$ stand for the corresponding relevance
and diversity weight vectors.

The relational function $h_S(R_i)$ denotes the way of representing the
diversity relationship between the current document $x_i$ and the
previously selected documents in S. It can be defined in three ways:
Minimal, Average and Maximal. Here we choose the Minimal way, defined as
follows.

$$ h_S(R_i) = \left( \min_{x_j \in S} R_{ij1}, \ldots, \min_{x_j \in S} R_{ijl} \right). $$
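As an illustration of Equation (1) with the Minimal relational function,
the scoring step might be sketched in Python as follows. This is only a
sketch under stated assumptions: the function names `relational_min` and
`score`, the plain-list feature representation, and the choice of a zero
diversity score when no document has been selected yet are illustrative
and are not taken from the R-LTR literature.

```python
from typing import List, Sequence


def relational_min(R_i: Sequence[Sequence[float]]) -> List[float]:
    """Minimal relational function h_S(R_i).

    R_i holds one diversity feature vector R_ij per already-selected
    document x_j; the result is the feature-wise minimum over S.
    """
    if not R_i:
        return []  # assumption: nothing selected yet, so no diversity term
    return [min(column) for column in zip(*R_i)]


def score(x_i, R_i, w_r, w_d) -> float:
    """f_S(x_i, R_i) = w_r^T x_i + w_d^T h_S(R_i)."""
    relevance = sum(w * x for w, x in zip(w_r, x_i))
    diversity = sum(w * h for w, h in zip(w_d, relational_min(R_i)))
    return relevance + diversity


# Toy values (hypothetical): two relevance features, two diversity
# features, and two documents already in S.
example_score = score(
    x_i=[1.0, 2.0],
    R_i=[[0.9, 0.4], [0.2, 0.8]],
    w_r=[0.5, 0.25],
    w_d=[1.0, 0.5],
)
```

In this toy call the relevance part is 1.0 and the Minimal relational
function keeps the smallest value of each diversity feature, so a candidate
is only rewarded for being different from every selected document.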

As described above, R-LTR is a flexible feature-based ranking function
that adapts well to the social media scenario and can be chosen as our
base utility function. Compared with heuristic definitions of the utility
function such as “Max-Sum” or “Max-Min” [?], it yields a more reasonable
base utility function through supervised learning. In a real application,
we need to define and utilize specific relevance and diversity features
closely related to the social media scenario.


Relevance Feature Vector $x_i$. For the relevance feature vector, we first
utilize traditional learning-to-rank relevance features, as follows.

- Weighting Features. The typical weighting models include TF-IDF, BM25
and the language model. For the language model, we use the
query-likelihood language model with a Dirichlet prior.

- Term Dependency Features. We also employ classic term dependency
features such as MRF [32] to enhance relevance. The MRF has two types of
values, ordered phrase and unordered phrase, so the total number of these
features is 2.
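The query-likelihood weighting feature listed above can be sketched as
follows. This is a minimal illustration of the standard Dirichlet-smoothed
query likelihood, not the Indri implementation; the function name, the
argument layout and the value of `mu` are assumptions made for this
example, and query terms unseen in the collection are simply skipped.

```python
import math
from collections import Counter


def dirichlet_query_likelihood(query_terms, doc_terms, collection_tf,
                               collection_len, mu=2000.0):
    """log P(q|d) = sum_w log((tf(w,d) + mu * P(w|C)) / (|d| + mu)).

    collection_tf maps a term to its frequency in the whole collection and
    collection_len is the total number of term occurrences (both assumed to
    be precomputed). mu is a placeholder value, not a tuned setting.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for w in query_terms:
        p_wc = collection_tf.get(w, 0) / collection_len
        if p_wc == 0.0:
            continue  # assumption: ignore terms unseen in the collection
        log_p += math.log((tf[w] + mu * p_wc) / (doc_len + mu))
    return log_p


# Toy collection statistics (hypothetical).
stats = {"a": 1, "b": 3}
with_term = dirichlet_query_likelihood(["a"], ["a", "b"], stats, 4, mu=2.0)
without_term = dirichlet_query_likelihood(["a"], ["b", "b"], stats, 4, mu=2.0)
```

A tweet that contains the query term receives a higher score than one that
does not, while the smoothing keeps the score finite in both cases.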

Additionally, we utilize some features specific to Twitter, as follows.

- Recency. We take the time factor into consideration and prefer more
recent tweets.

- UserRank. The importance of a user account, which captures the
confidence of information. It can be obtained simply from the number of
followers of the account.

- Retweet Number. If a tweet is retweeted many times, it is usually of
high importance.

The latter two types of features can be obtained via the API² provided by
TREC.


Diversity Feature Vector $R_{ij}$. For diversity features, we utilize
typical semantic diversity features, as follows.

Cosine Diversity. The cosine diversity between two tweets is calculated
from their weighted term vector representations, and the feature is
defined as follows.

$$ R_{ij1} = 1 - \frac{s_i \cdot s_j}{\|s_i\| \, \|s_j\|} $$

where $s_i$, $s_j$ are the weighted term vectors of the tweets based on
tf * idf, tf denotes the term frequency, and idf denotes the inverse
document frequency.

Jaccard Diversity. The Jaccard diversity between two tweets is based on
the ratio of overlapping terms, and is defined as follows.

$$ R_{ij2} = 1 - \frac{|s_i \cap s_j|}{|s_i \cup s_j|} $$

where $s_i$, $s_j$ are the term sets of the tweets.

Subtopic Diversity. Different tweets may be associated with different
aspects of the given topic. We use Probabilistic Latent Semantic Analysis
(PLSA) [?] to model the implicit subtopic distribution of candidate tweets.
We can then define a subtopic diversity feature based on the KL distance
between these distributions, where the subtopic distribution of a tweet is
estimated as follows.

$$ P(z_l \mid s_i) = \frac{1}{|s_i|} \sum_{w_j \in s_i} P(z_l \mid s_i, w_j) $$

where $P(z_l \mid s_i, w_j)$ is calculated and saved in the E-step of the
EM procedure, and $|s_i|$ is the number of terms in tweet $s_i$.

2 https://github.com/lintool/twitter-tools/wiki/TREC-2013-API-Specifications

Based on these diversity features, we obtain the diversity feature vector
$R_{ij} = (R_{ij1}, R_{ij2}, R_{ij3})$. Note that we only list some
representative diversity features used in our work; other useful diversity
features can easily be incorporated into the utility function.
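The three diversity features described above might be computed along the
following lines. This is a sketch under stated assumptions: tweets are
represented as a dict of term weights (cosine), a list of terms (Jaccard)
and a list of subtopic probabilities (KL); the function names, the
direction of the KL divergence and the small smoothing constant `eps` are
choices made for this example, since the text does not fix them.

```python
import math


def cosine_diversity(vec_i, vec_j):
    """1 - cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * vec_j.get(t, 0.0) for t, w in vec_i.items())
    norm_i = math.sqrt(sum(w * w for w in vec_i.values()))
    norm_j = math.sqrt(sum(w * w for w in vec_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 1.0  # assumption: an empty tweet is maximally different
    return 1.0 - dot / (norm_i * norm_j)


def jaccard_diversity(terms_i, terms_j):
    """1 - |intersection| / |union| of the two term sets."""
    set_i, set_j = set(terms_i), set(terms_j)
    union = set_i | set_j
    if not union:
        return 0.0  # assumption: two empty tweets are treated as identical
    return 1.0 - len(set_i & set_j) / len(union)


def kl_diversity(p, q, eps=1e-12):
    """KL(p || q) between two subtopic distributions of equal length."""
    return sum(p_l * math.log((p_l + eps) / (q_l + eps))
               for p_l, q_l in zip(p, q) if p_l > 0)


def diversity_vector(tweet_i, tweet_j):
    """R_ij = (R_ij1, R_ij2, R_ij3) for two tweets given as dicts with the
    hypothetical keys 'weights', 'terms' and 'subtopics'."""
    return (
        cosine_diversity(tweet_i["weights"], tweet_j["weights"]),
        jaccard_diversity(tweet_i["terms"], tweet_j["terms"]),
        kl_diversity(tweet_i["subtopics"], tweet_j["subtopics"]),
    )
```

All three values grow as the two tweets become more different, which is
what the Minimal relational function in the utility function relies on.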


2.2 Dynamic Preservation Scheme based on Periodic Windows 



Fig. 1. Dynamic Preservation Scheme based on Periodic Windows (figure not
reproduced; its labels are “K-m” and “m”)


The recency requirement of the TDIF application has two aspects. First,
users want to follow the most recent information about a certain topic.
Second, for a continuous data stream, the efficiency of information
processing must be high.

Considering these two aspects, we propose a dynamic preservation scheme
based on periodic time windows. Specifically, we segment the online data
stream into disjoint periods by time unit (or by number of items).
Figure 1 gives a simple illustration. The core idea of the scheme consists
of the following aspects:

1. the periodic time windows are disjoint and non-overlapping;

2. the utility function described in Equation (1) is used;

3. a reliant local preservation scheme is used. Specifically, for each new
time window, we preserve the top-(k - m) items in the prior result set,
then utilize the utility function to select m new items reliant on the
existing k - m items. In this way, we maintain the diversity of the final
result set.
Algorithm 1 Dynamic Preservation Scheme based on Periodic Windows

Input: S_{K,t-1} - Result set with K items until time (t - 1)
       X_t - The set of items in the new periodic time window
Output: S_{K,t} - Result set with K items until time t
1: Initialize: S_{K,t} ← top-(K-m) of S_{K,t-1}
2: for i = 1, ..., m do
3:     bestDoc ← argmax_{x ∈ X_t} f_{S_{K,t}}(x, R)
4:     S_{K,t} ← S_{K,t} ∪ {bestDoc}
5: end for
6: Sort S_{K,t} by chronological order
7: return S_{K,t}


The overall procedure is described in Algorithm 1. When merging the old
top-(K - m) items and the m new items into the result set, we strictly
display the results in chronological order, as in line 6 of Algorithm 1.
This reflects the freshness requirement of users in social media [15],
where users are used to following released information in chronological
order.

Algorithm 1 has a time complexity of $O(|X_t| \cdot m)$, with
$0 < m \le K$ and $|X_t| \le \sum_{i=0}^{t} |X_i| = N$. Therefore, compared
with the traditional all-batch mode, which has a time complexity of
$O(N \cdot K)$, the dynamic mechanism has better processing efficiency. In
addition, m is a control parameter that flexibly controls the “staleness”
of the returned result set. For example, if m = K, Algorithm 1 displays
only the most recent information about the topic.
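One update step of Algorithm 1 could be sketched in Python as follows. The
item layout, the function names and the toy utility are hypothetical and
only stand in for the learned R-LTR utility function; in particular, the
sketch assumes that the top-(K - m) previous items are chosen by a
relevance score, and it removes a chosen item from the candidates so that
it is not selected twice.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical item layout: (timestamp, relevance score, subtopic label).
Item = Tuple[float, float, str]


def dynamic_preservation_step(
    prev_result: Sequence[Item],
    window_items: Sequence[Item],
    k: int,
    m: int,
    relevance: Callable[[Item], float],
    utility: Callable[[Item, List[Item]], float],
) -> List[Item]:
    """Update the result set for one new time window (assumes 0 < m <= k)."""
    # Line 1: keep the top-(K - m) items of the previous result set.
    selected = sorted(prev_result, key=relevance, reverse=True)[: k - m]
    candidates = list(window_items)
    # Lines 2-5: greedily add up to m items from the new window.
    for _ in range(m):
        if not candidates:
            break
        best = max(candidates, key=lambda item: utility(item, selected))
        selected.append(best)
        candidates.remove(best)
    # Line 6: display the merged result set in chronological order.
    selected.sort(key=lambda item: item[0])
    return selected


def toy_utility(item: Item, selected: List[Item]) -> float:
    """Stand-in for f_S: relevance plus a bonus for an uncovered subtopic."""
    covered = {s[2] for s in selected}
    return item[1] + (0.0 if item[2] in covered else 1.0)


previous = [(1.0, 0.9, "a"), (2.0, 0.5, "b"), (3.0, 0.7, "c")]
new_window = [(4.0, 0.8, "a"), (5.0, 0.6, "d")]
result = dynamic_preservation_step(
    previous, new_window, k=3, m=1,
    relevance=lambda item: item[1], utility=toy_utility,
)
```

In the toy call, the two most relevant old items are kept and the new item
on subtopic "d" is preferred over the more relevant but redundant item on
subtopic "a". Each step only scores the items of the current window, which
is where the efficiency gain over batch re-ranking comes from.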

3 Dynamic Diversity Evaluation Measures 

Topic-focused dynamic information filtering is a new application problem
in current social media, which incorporates the relevance, diversity,
recency and confidence of information. Therefore, it is not easy to obtain
a reasonable, comprehensive evaluation for such a general task.

In the current Microblog task of TREC, the task evaluation focuses only on
retrieval relevance [?], and the evaluation metrics are just the
traditional MAP and P@K [21]. In the diversity task of the TREC Web track
[?], the evaluation metrics take both relevance and diversity into
consideration, and include ERR-IA, α-NDCG and NRBP. However, these
existing measures cannot take recency and confidence into consideration,
and are therefore not suitable for evaluating the TDIF application
problem. Based on the above analysis, we propose a series of new dynamic
diversity evaluation measures to obtain a more reasonable evaluation for
the TDIF task.

We first review the existing diversity evaluation measures, which are
summarized in Table 1. These measures share the same nature and differ
only in minor components such as the way of position discounting. We find
that there are two key points in these measures: diversity and the gain.

Table 1. Summary of typical diversity measures

diversity:       $\frac{1}{\mathcal{N}} \sum_{i} p_i G_i$
gain (novelty):  $G_i = \sum_{k} \frac{g_i^k (1-\alpha)^{c_i^k}}{D_k}$

measure    discount
α-NDCG     $D_k = \log(k + 1)$
ERR-IA     $D_k = k$
NRBP       $D_k = (1/\beta)^{k-1}$

Diversity means subtopic (or aspect) coverage, which is based on explicit
subtopic information of a query. For a given subtopic, the gain describes
redundancy penalizing and position discounting when accumulating the
relevance at every rank. Taking α-NDCG as an example, it is formulated as
follows:


$$ \alpha\text{-NDCG} = \frac{1}{\mathcal{N}} \sum_{i=1}^{M} p_i \sum_{k=1}^{K} \frac{g_i^k (1-\alpha)^{c_i^k}}{\log_2(k+1)} $$


where $g_i^k$ is a binary relevance value for the document at position k
with respect to subtopic i, $\alpha$ is a constant in (0, 1],
$c_i^k = \sum_{j=1}^{k-1} g_i^j$ is the number of documents ranked before
position k that are judged relevant to subtopic i, K is the number of
documents in the ranking list, M is the number of subtopics, $p_i$ is the
probability of subtopic i, and $\mathcal{N}$ is a normalization factor.
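As a small worked illustration of the formula above, consider a
hypothetical ranking of K = 3 documents and M = 2 subtopics with
$p_1 = p_2 = 0.5$ and $\alpha = 0.5$, where the first two documents are
relevant to subtopic 1 and the third to subtopic 2. These values are
invented for the example, and the normalization factor is left out, so the
result is the unnormalized sum.

```latex
\begin{aligned}
\text{subtopic 1:}\quad
  & \frac{1 \cdot (1-0.5)^{0}}{\log_2 2}
  + \frac{1 \cdot (1-0.5)^{1}}{\log_2 3}
  + 0 \approx 1 + 0.315 = 1.315 \\
\text{subtopic 2:}\quad
  & 0 + 0 + \frac{1 \cdot (1-0.5)^{0}}{\log_2 4} = 0.5 \\
\mathcal{N} \cdot \alpha\text{-NDCG}
  & \approx 0.5 \cdot 1.315 + 0.5 \cdot 0.5 \approx 0.908
\end{aligned}
```

The second document contributes less than the first even before position
discounting, because it repeats a subtopic that is already covered.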

We incorporate recency and confidence factors into existing diversity
evaluation measures such as α-NDCG, and propose a new dynamic diversity
evaluation measure, d-NDCG, as follows:


$$ \text{d-NDCG} = \frac{1}{\mathcal{N}} \sum_{i=1}^{M} p_i \sum_{k=1}^{K} \frac{\gamma^{t_{rcy}} \cdot g_i^k (1-\alpha)^{c_i^k} \cdot u_r}{\log_2(k+1)} \tag{2} $$


and

$$ t_{rcy} = \text{topic.timestamp} - \text{tweet.timestamp} $$

where topic.timestamp is the current time of topic tracking and
tweet.timestamp is the release time of the tweet. $\gamma$ is the
corresponding trade-off parameter, $0 < \gamma < 1$; we set $\gamma = 0.5$
in the following experiments. The $\gamma^{t_{rcy}}$ part measures the
recency of information, and $u_r$ measures the confidence of information by
way of the user account weight [?].

From the definition of d-NDCG, the final evaluation score of each item
depends on several factors: recency, relevance, diversity and confidence.
In a real application, we usually need to rescale the values of $t_{rcy}$
and $u_r$ to the scale of the relevance label $g_i^k$. For example, the
public Twitter dataset in the TREC Microblog task has three grades of
labels: 2 (relevant), 1 (partly relevant) and 0 (not relevant). In the
following experimental evaluation, we simply rescale $t_{rcy}$ into three
grades based on thresholds: 2 (i.e., history), 1 (i.e., recent) and 0
(i.e., latest), and rescale $u_r$ into three grades: 3 (i.e., significant
user account), 2 (i.e., important user account) and 1 (i.e., normal user
account).
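The d-NDCG computation with rescaled grades might look as follows in
Python. This is a sketch, not the evaluation script used in the
experiments: the dictionary layout of a ranked item, the function names
and the two recency thresholds are assumptions, the counter $c_i^k$ counts
earlier items with a non-zero label, and the normalization factor is passed
in as `norm` rather than computed.

```python
import math
from typing import Dict, List


def recency_grade(age_seconds: float,
                  recent_after: float = 2 * 86400,
                  history_after: float = 6 * 86400) -> int:
    """Rescale t_rcy to 0 (latest), 1 (recent) or 2 (history).

    Both thresholds are hypothetical placeholders.
    """
    if age_seconds < recent_after:
        return 0
    if age_seconds < history_after:
        return 1
    return 2


def d_ndcg(ranking: List[dict], subtopic_probs: Dict[str, float],
           alpha: float = 0.5, gamma: float = 0.5,
           norm: float = 1.0) -> float:
    """Sum of gamma^t_rcy * g * (1 - alpha)^c * u_r / log2(k + 1).

    Each ranked item is a dict with the assumed keys 'rel' (subtopic ->
    relevance label), 't_rcy' (recency grade) and 'u_r' (confidence grade).
    """
    total = 0.0
    for subtopic, p_i in subtopic_probs.items():
        seen = 0  # c_i^k: relevant items for this subtopic ranked earlier
        for k, doc in enumerate(ranking, start=1):
            g = doc["rel"].get(subtopic, 0)
            gain = (gamma ** doc["t_rcy"]) * g * ((1 - alpha) ** seen) * doc["u_r"]
            total += p_i * gain / math.log2(k + 1)
            if g > 0:
                seen += 1
    return total / norm


# Toy ranking (hypothetical): two items on the same subtopic.
toy_ranking = [
    {"rel": {"s1": 1}, "t_rcy": 0, "u_r": 1},
    {"rel": {"s1": 1}, "t_rcy": 1, "u_r": 2},
]
toy_value = d_ndcg(toy_ranking, {"s1": 1.0})
```

With gamma = 1 and all confidence grades equal to 1, the function reduces
to the unnormalized α-NDCG sum, which makes the effect of the two added
factors easy to isolate.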
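Similarly, we can define d-ERR and d-NRBP by simply replacing the “gain”
component in Table 1 with
$\gamma^{t_{rcy}} \cdot g_i^k (1-\alpha)^{c_i^k} \cdot u_r$, formalized as
follows.

$$ \frac{1}{\mathcal{N}} \sum_{i=1}^{M} p_i \sum_{k=1}^{K} \frac{\gamma^{t_{rcy}} \cdot g_i^k (1-\alpha)^{c_i^k} \cdot u_r}{D_k} \tag{3} $$

where $D_k$ is the position discount of the corresponding measure.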



4 Experiments 

In this section, we evaluate the TDIF task from different aspects. We
first describe the experimental setup, which includes the dataset, the
evaluation metrics and the baseline methods. Then we conduct an extensive
automatic evaluation of our approach and the baseline strategies. Finally,
we conduct a manual evaluation for further analysis.

4.1 Experimental Setup

Here we introduce the experimental setup, including the data collections,
evaluation metrics and baseline methods.

Data Collections. We use the public Twitter dataset from the Microblog
task of TREC 2011 and TREC 2012, which is a sample of approximately 16M
tweets spanning a period of 16 days. TREC 2011 provides 50 test topics,
and TREC 2012 provides 60 test topics.

In our experiments, we keep only English tweets, and apply the Porter
stemmer to the tweets and test topics. Considering the “short text” nature
of microblogs, we do not remove stopwords, to avoid information loss. We
use the Indri toolkit (version 5.2)³ as the basic retrieval platform. We
also utilize the Twitter API⁴ provided by TREC 2013 to retrieve several
features such as the number of followers and the retweet number. We
conduct query expansion by pseudo relevance feedback and by external
expansion via the Google search engine⁵, which aims to obtain more aspects
of a test topic so as to cover more information.

Evaluation Metrics. We evaluate all the methods in terms of both
effectiveness and efficiency. For effectiveness, we first utilize the
representative diversity measure α-NDCG [?], and then the proposed dynamic
diversity measure d-NDCG. For both α-NDCG and d-NDCG, the cutoff is set to
K = 20.

3 http://lemurproject.org/indri 

4 https://github.com/lintool/twitter-tools 

5 http://google.com 
Both α-NDCG and d-NDCG need relevance labels at the subtopic level, which
the current public dataset does not provide. Therefore, we perform
additional manual relevance labeling at the subtopic level on top of the
existing relevant tweets. The labeling method is simple: for each relevant
tweet, we judge whether it covers a different subtopic compared with the
prior relevant tweets. If so, we consider it relevant to a new subtopic.
In total, we label 2955 relevant tweets for 49 test topics for TREC 2011,
and 6286 relevant tweets for 60 test topics for TREC 2012. On average
there are 3.6 subtopics per test topic.

For efficiency, we mainly report the average processing time of the
different methods for each test topic.


Baseline Methods. R-LTR has been shown to be a state-of-the-art diverse
ranking method. Therefore, in the topic-focused information filtering
task, we mainly focus on comparing strategies rather than the detailed
ranking models (or utility functions). The typical baseline methods are as
follows:

- All_old. The All_old strategy is the original R-LTR method optimized for
traditional diversity measures such as α-NDCG; at each new time point, it
ranks all the candidate items in a batch way.

- All_new. The All_new strategy is the R-LTR method optimized for the new
dynamic diversity measure d-NDCG; it also ranks all the candidate items at
each time point.

- TopRel. This method selects the K most relevant items in each new
periodic time window. Specifically, it uses the ListMLE method [29] as the
utility function and displays the results in chronological order. This
method does not consider the diversity requirement, which is similar to
the approach used in industry.

Our proposed “Dynamic reliant local Preservation scheme” is denoted as
“DP”, and is based on the R-LTR utility function optimized for α-NDCG.
Unless otherwise stated, the default value of the parameter m is set to 10.

For proper evaluation, we choose 2 days as the time unit, because there
are not enough relevant tweets for each test topic in our dataset if we
choose a window size smaller than 2 days. Note that any suitable window
size can be chosen based on the real application scenario.
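Segmenting a stream into disjoint periodic windows of a fixed length, such
as the 2-day unit chosen here, might be sketched as follows. The function
name and the `(timestamp, payload)` item layout are assumptions made for
this illustration; timestamps are taken to be seconds.

```python
from collections import defaultdict


def segment_stream(items, window_seconds, start_time=None):
    """Group (timestamp, payload) pairs into consecutive disjoint windows.

    Returns a list of windows in chronological order; windows without any
    item are returned as empty lists. start_time defaults to the earliest
    timestamp and must not be later than it.
    """
    ordered = sorted(items, key=lambda item: item[0])
    if not ordered:
        return []
    t0 = ordered[0][0] if start_time is None else start_time
    if ordered[0][0] < t0:
        raise ValueError("start_time is later than the earliest item")
    buckets = defaultdict(list)
    for ts, payload in ordered:
        buckets[int((ts - t0) // window_seconds)].append((ts, payload))
    return [buckets[i] for i in range(max(buckets) + 1)]


TWO_DAYS = 2 * 24 * 3600
toy_stream = [(0, "a"), (100, "b"), (172800, "c"), (400000, "d")]
windows = segment_stream(toy_stream, TWO_DAYS)
```

Because every timestamp falls into exactly one bucket, the resulting
windows are disjoint and non-overlapping, as required by the preservation
scheme.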

We utilize the tweet data in the first two days as training data for the
utility functions ListMLE and R-LTR; for the detailed training process, we
refer to the corresponding literature [29, 31].


4.2 Evaluation on Traditional Diversity Measure 

We first utilize the traditional diversity measure α-NDCG for evaluation,
and the detailed results are shown in Figure 2. The horizontal axis shows
the time points in chronological order, and the vertical axis shows the
corresponding α-NDCG score.

Fig. 2. Performance comparison on α-NDCG measure (plot not reproduced;
legend: All_old, All_new, TopRel, DP)


From the figure, we can observe that All_old performs best, which is in
accordance with our intuition. All_new performs worse than All_old because
it is optimized for the new diversity measure. The All_Batch strategies
(i.e., All_old and All_new) rank all the candidate items at each time
point, and therefore perform better than the other two approaches. Our DP
approach shows slightly lower but comparable performance compared with the
All_Batch strategies. In fact, the DP method can be viewed as an
approximation of All_old under the online data stream scenario. It
captures more of the recency factor at the cost of a small loss on α-NDCG.
TopRel performs worst because it only considers the relevance requirement.
It is easy to apply and is commonly used in industry.


4.3 Evaluation on Dynamic Diversity Measure 

d-NDCG is a new dynamic diversity measure, which takes the recency and
confidence factors into consideration in addition to the traditional
relevance and diversity factors. We utilize d-NDCG for further evaluation.
The evaluation results are shown in Figure 3.

We can see that the proposed DP performs best among all the methods.
Although it is optimized directly for the d-NDCG measure, All_new still
performs worse than the DP strategy, which captures more of the recency
factor through the periodic time window scheme. Combining these results
with those in Figure 2, All_old and All_new each perform better under the
diversity measure they are optimized for. TopRel performs worst among all
the baselines, which is also consistent with the evaluation results in
Figure 2.

Overall, our proposed DP strategy shows better performance on the d-NDCG
measure, which means our approach is more suitable for the topic-focused
dynamic information filtering task.
Fig. 3. Performance comparison on d-NDCG measure


4.4 Efficiency Evaluation 


Fig. 4. Average processing time of topic-focused information filtering
(unit: millisecond; plot not reproduced)


An important requirement of the TDIF task is processing efficiency for the
online data stream. Therefore, we conduct an efficiency evaluation using
the average processing time for each test topic.

The evaluation results are shown in Figure 4. Here we use ‘All’ to denote
both the All_old and All_new strategies, since they have nearly the same
efficiency. We can see that the All_Batch strategy has the lowest
efficiency, because it processes all the candidate items at each time
point. The DP strategy shows much higher efficiency than the All_Batch
way, which is consistent with the theoretical analysis in Section 2.2: it
chooses m items from a relatively small candidate set at each time point.

TopRel shows lower efficiency than DP, but higher than All_Batch, because
it chooses 20 items in each time window and is therefore slower than the
DP approach (default m = 10). In fact, the TopRel method drops the
consideration of diversity relations, so it is faster than the DP approach
when m = 20, as shown in the following evaluation of the sensitivity to
parameter m.

Fig. 5. Parameter sensitivity analysis of m: (a) Evaluation on d-NDCG
measure; (b) Average processing time (plots not reproduced)

4.5 Parameter Sensitivity 

In our DP approach, the parameter m (0 < m ≤ K) controls the “staleness”
of the result set. In this subsection, we evaluate its effect in terms of
both d-NDCG and efficiency.

We consider three settings: m = 5, m = 10 and m = 20. The evaluation
results are shown in Figure 5. In terms of d-NDCG (subfigure (a)), m = 10
performs best, followed by m = 20 and m = 5. In terms of efficiency
(subfigure (b)), m = 5 performs best, followed by m = 10 and m = 20.
Therefore, considering both aspects, m = 10 gives the best overall
performance, and is set as the default parameter value.

Additionally, when m = 20, the processing time is between 20 and 25
milliseconds, which is slower than the TopRel method (whose average
processing time is about 20 milliseconds, see Figure 4), due to the
consideration of diversity relations.


5 Related Work 

Most existing research treats the problem of diverse ranking as a ‘static
subset problem’ [?]. Specifically, these approaches try to find an optimal
or suboptimal subset of a static data set. With the development of new
social media such as Twitter and Sina Weibo in China, the ranking scenario
has changed. In this new scenario, new information is continuously
released online as a data stream, and how to process stream information
effectively has become a new and challenging problem.

There is little research on the data stream scenario; representative work
includes [13, 23, ?]. Drosou et al. [?] make a heuristic attempt on the
“publish/subscribe” scenario. Specifically, they define ‘diversity over a
sliding window’, and then utilize the classical “Max-Sum” objective [?] as
the utility function in a heuristic greedy strategy. The idea of this work
also inspires their follow-up work [15], which focuses on highly efficient
computation of dynamic diversity via a “cover tree” indexing scheme that
supports efficient update operations such as insertion and deletion.
Minack et al. define “incremental diversity”. In their work, a
near-optimal diverse set can be maintained at any point in the data
stream. The authors also utilize the classical “Max-Sum” or “Max-Min”
objective as their utility function in a heuristic interchange scheme: for
each new item, a decision to discard or insert is made so as to improve
the diversity of the result set.

With the rise of social media, there is much related research on social
media. Chen et al. [?] discuss and analyze content recommendation on
Twitter along several feature dimensions. Hong et al. focus on how to
build effective systems for ranking social updates from the unique
perspective of LinkedIn. They leverage ideas from information retrieval
and recommender systems, and show promising performance. Choudhury et al.
[?] focus on topic retrieval on Twitter, aiming to obtain the most
relevant results. However, their work is still limited to the search
scenario, which is almost the same as traditional Web search.

Overall, compared with prior research, our work differs in several
respects: (1) the research problem is different: our work aims to tackle
topic-focused dynamic information filtering in social media, which is a
new application problem; (2) our approach is also different. We utilize a
different utility function, the R-LTR ranking model, which is a supervised
feature-based ranking model that adapts well to different application
scenarios. Our dynamic preservation scheme also differs from prior work.

6 Conclusions 

In this paper, we investigate the problem of topic-focused dynamic
information filtering in social media. We first analyze the properties of
the application problem, which has several typical requirements:
relevance, diversity, recency and confidence. In this scenario, balancing
these factors properly is very important. We then propose to utilize the
relational learning-to-rank model, combined with a dynamic preservation
scheme based on periodic time windows, to solve the TDIF problem. In this
way, we can capture these ranking factors effectively. Due to the new
requirements of the TDIF problem, we propose new dynamic diversity
evaluation measures to obtain a more reasonable evaluation for such an
application problem; these measures take the recency and confidence
factors into consideration on top of relevance and diversity. We conduct
extensive automatic and manual evaluation on the public TREC Twitter
dataset, and the experimental results demonstrate the effectiveness of our
approach.

Overall, we present a complete investigation of a typical application
problem in social media, covering the analysis, solution and evaluation of
the problem. Our work sheds some light on the TDIF problem, which is
meaningful for future research.

Abstract. With the development of online social media such as Twitter,
many users track hot topics to satisfy their information needs.
For a hot topic, new opinions or ideas are continuously produced
in the form of an online data stream. In this scenario, how to effectively filter
and display information dynamically is a critical problem. We call
this problem Topic-focused Dynamic Information Filtering (TDIF
for short) in social media. In this paper, we start an open discussion
of such application problems. We first analyze the properties of
the TDIF problem, which usually has several typical requirements:
relevance, diversity, recency and confidence. Recency means that users
want to follow recent opinions or news, and the confidence of the
information must also be taken into consideration. Balancing these
factors properly is both important and challenging. We propose a
dynamic preservation strategy on the basis of an existing feature-based
utility function to solve the TDIF problem. Additionally, we propose
new dynamic diversity measures to obtain a more reasonable evaluation for
such application problems. Extensive exploratory experiments have been
conducted on the public TREC Twitter dataset, and the experimental results
validate the effectiveness of our approach.

Keywords: Data Stream, Utility Function, Dynamic Preservation Scheme 


1 Introduction 

The development of new social media such as Twitter accelerates the spread of
online information. In social media, new information is continuously
produced in the form of an online data stream. For a hot topic, how to filter and
display relevant information dynamically is a critical problem, which we
call Topic-focused Dynamic Information Filtering (TDIF) in social media.

The TDIF problem has three typical requirements: relevance, diversity and
recency. Relevance requires that the tweet be relevant to the
topic. Diversity requires that the result set describe the topic from different
aspects with little redundancy. Recency means that users want to follow
recent opinions or news quickly. Additionally, the human factor affects the
confidence of the tweet information. Balancing these critical
factors is therefore a new and challenging problem.

In fact, little prior research has tackled the TDIF problem.
Most existing work focuses on only one or two factors in information retrieval,
such as pure relevance, pure diversity [25], or relevance combined with
diversity. Even in industry, the problem has not been
solved well: systems usually consider only relevance and ignore diversity or recency.

In this paper, we use the relational learning-to-rank model (R-LTR for
short) [25] as the utility function, and combine it with a dynamic preservation
scheme based on periodic time windows to solve the TDIF problem. The R-LTR
model is a state-of-the-art diverse ranking method, which models the diversity
relations among documents in the ranking process, in addition to the content
information of individual documents. It is a flexible feature-based ranking model that
adapts well to different application scenarios. Although the R-LTR model
handles relevance and diversity well, it is limited to static datasets, and its
efficiency can hardly satisfy the online data stream scenario.

Therefore, we propose a dynamic preservation scheme based on the R-LTR
model. Specifically, we segment the data stream into disjoint
periods of time length T (the segmentation granularity can be days or hours,
depending on the detailed requirements). For each new time window, we preserve
the top-(k - m) most relevant previous results, then use the R-LTR ranking
function to select m new relevant results, and finally display all k results in
chronological order. Here the parameter m flexibly controls the “staleness”
of the returned results.

Given the requirements of the TDIF application problem, we also propose new
dynamic diversity measures to obtain a more reasonable evaluation. We introduce
a recency factor and a confidence factor into existing popular diversity evaluation
measures (i.e., ERR-IA, α-NDCG and NRBP). This yields a series
of dynamic diversity evaluation measures: d-ERR, d-NDCG and d-NRBP.

We conduct extensive evaluations on the public TREC Twitter dataset, and the
experimental results show that our approach achieves promising performance
on both traditional diversity measures and the new dynamic diversity measures.
Our approach also has high processing efficiency.

The rest of the paper is organized as follows. Section 2 introduces our
approach to the TDIF problem. Section 3 introduces the new dynamic diversity
measures. Section 4 presents the experimental results. Section 5 describes related
work and Section 6 concludes the paper.

2 Our Approach 

The TDIF problem in social media has several typical requirements: relevance,
diversity, recency and confidence. The basic motivation of our
approach is therefore to capture and balance these requirements effectively. In this
section, we describe our strategy in detail. It consists of two parts:
the basic utility function and the dynamic strategy.

2.1 Utility Function 

The R-LTR model can effectively solve the diverse ranking problem in the static
dataset scenario, modeling both relevance and diversity. As
described in the literature [25], the score of a candidate document contains two
parts: a relevance score based on the content information of the individual document,
and a diversity score based on the relationship between the current document and
those previously selected. We use $X$ to denote all the candidate documents, $S$ to
denote the previously selected documents, and $X \setminus S$ to denote the remaining documents.
The score function can be formalized as follows.


$$f_S(\mathbf{x}_i, R_i) = \omega_r^{T} \mathbf{x}_i + \omega_d^{T} h_S(R_i), \quad \forall x_i \in X \setminus S \qquad (1)$$

where $\mathbf{x}_i$ denotes the relevance feature vector of the candidate document $x_i$. $R_i$
stands for the matrix of relationships between document $x_i$ and the other selected
documents, where each $R_{ij}$ stands for the diversity feature vector between
documents $x_i$ and $x_j$, represented by the feature vector $(R_{ij1}, \dots, R_{ijl})$, $x_j \in S$, and
$R_{ijk}$ stands for the $k$-th diversity feature between documents $x_i$ and $x_j$. $h_S(R_i)$
stands for the relational function on $R_i$, and $\omega_r$ and $\omega_d$ stand for the corresponding
relevance and diversity weight vectors.

The relational function $h_S(R_i)$ denotes the way of representing the diversity
relationship between the current document $x_i$ and the previously selected
documents in $S$. It can be defined in three ways: Minimal, Average and Maximal.
Here we choose the Minimal way, defined as follows.

$$h_S(R_i) = \left( \min_{x_j \in S} R_{ij1}, \; \dots, \; \min_{x_j \in S} R_{ijl} \right).$$
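
As an illustration of the score function with the Minimal relational function, the following is a minimal sketch rather than the authors' implementation. The function names and the plain-list representation of feature vectors are assumptions made for this example, as is the choice to return a zero diversity vector when no document has been selected yet.

```python
from typing import List, Sequence


def relational_min(R_i: Sequence[Sequence[float]], n_features: int) -> List[float]:
    """Minimal relational function h_S(R_i).

    R_i holds one diversity feature vector R_ij per previously selected
    document x_j; the result is the per-feature minimum over those vectors.
    With no selected documents yet, a zero vector is returned (assumption).
    """
    if not R_i:
        return [0.0] * n_features
    return [min(column) for column in zip(*R_i)]


def rltr_score(
    x_i: Sequence[float],
    R_i: Sequence[Sequence[float]],
    w_r: Sequence[float],
    w_d: Sequence[float],
) -> float:
    """Score of a candidate: w_r . x_i + w_d . h_S(R_i)."""
    relevance = sum(w * x for w, x in zip(w_r, x_i))
    h = relational_min(R_i, len(w_d))
    diversity = sum(w * v for w, v in zip(w_d, h))
    return relevance + diversity
```

For example, `rltr_score([1.0, 2.0], [[0.5, 0.9], [0.3, 1.0]], [0.5, 0.25], [1.0, 2.0])` combines a relevance part of 1.0 with a diversity part computed from the per-feature minima (0.3, 0.9).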

R-LTR is a flexible feature-based ranking function that adapts well
to the social media scenario, so we choose it as our basic utility function.
Compared with heuristic definitions of the utility function such as “Max-Sum”
or “Max-Min” [16,13,20], we can obtain a more reasonable basic utility function
by supervised learning. In a real application, we need to define and use
specific relevance and diversity features closely related to the social media scenario.

Relevance Feature Vector $\mathbf{x}_i$. For the relevance feature vector, we can use
traditional learning-to-rank relevance features, such as weighting models including
typical TF-IDF, BM25 and language models.

Additionally, we use some features specific to Twitter, as follows.

- Recency. We take the release time of a tweet into consideration, and prefer more
recent information.

- UserRank. The importance of a user account can measure the confidence of
information, and can be obtained simply from the user's number of followers.

- Retweet Number. If a tweet is retweeted many times, it is usually of high
importance.


Diversity Feature Vector $R_{ij}$. For diversity features, we use typical semantic
diversity features such as Cosine Diversity ($R_{ij1}$) and Jaccard Diversity ($R_{ij2}$),
together with Subtopic Diversity ($R_{ij3}$). For subtopic diversity, we use
Probabilistic Latent Semantic Analysis (PLSA) [17] to model the implicit subtopic
distribution of candidate tweets. We can then define a subtopic diversity
feature based on the KL distance, as follows.


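$$R_{ij3} = \sum_{z_l \in Z} P(z_l \mid s_i) \log \frac{P(z_l \mid s_i)}{P(z_l \mid s_j)}, \qquad P(z_l \mid s_i) = \frac{1}{|s_i|} \sum_{w_j \in s_i} P(z_l \mid s_i, w_j)$$

where $P(z_l \mid s_i, w_j)$ is calculated and saved in the E-step of the EM procedure.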

Based on these diversity features, we can obtain the diversity feature vector
$R_{ij} = (R_{ij1}, R_{ij2}, R_{ij3})$. Note that here we only list some representative
diversity features used in our work; other useful diversity features can easily be
incorporated into the utility function.
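
The three diversity features can be sketched as follows. The text does not give exact definitions for Cosine Diversity and Jaccard Diversity, so this sketch assumes they are one minus the corresponding similarity over tweet tokens; the function names and the `eps` smoothing term in the KL feature are likewise assumptions for illustration, and the subtopic distributions are taken as given (for example, from a fitted PLSA model).

```python
import math
from collections import Counter
from typing import Sequence


def cosine_diversity(a_tokens: Sequence[str], b_tokens: Sequence[str]) -> float:
    """1 - cosine similarity of term-frequency vectors (assumed definition)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_diversity(a_tokens: Sequence[str], b_tokens: Sequence[str]) -> float:
    """1 - Jaccard similarity of token sets (assumed definition)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def kl_diversity(p_i: Sequence[float], p_j: Sequence[float], eps: float = 1e-12) -> float:
    """KL distance between two subtopic distributions P(z|s_i) and P(z|s_j).

    eps avoids division by zero and log(0); it is not part of the paper.
    """
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_i, p_j) if p > 0)
```

Note that the KL distance is asymmetric, so `kl_diversity(p_i, p_j)` and `kl_diversity(p_j, p_i)` generally differ.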


2.2 Dynamic Preservation Scheme based on Periodic Windows 


[Figure: timeline of periodic time windows T_0, ..., T_{c-1}, T_c, with K-m preserved items]

Fig. 1. Dynamic Preservation Strategy


In the TDIF application problem, social media users want to follow recent
information on a certain topic. Meanwhile, for a continuous online data stream, the
efficiency of information processing must be high. Therefore, we propose a dynamic
preservation scheme based on periodic time windows. Specifically, we segment
the online data stream into disjoint time units. Figure 1 gives a simple illustration.
The strategy has two key points:

1. Periodic time windows are disjoint and non-overlapping.

2. We use a reliant local preservation scheme. For each new time window, we
preserve the top-(k - m) items of the prior result set, and then use the utility
function to select m new items conditioned on the existing k - m items. In this
way, we can maintain the diversity of the final result set.
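
The segmentation into disjoint periodic windows can be sketched as follows. This is only an illustration: the function name, the `(timestamp, payload)` tuple layout and the numeric timestamps are assumptions for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def segment_stream(
    items: Iterable[Tuple[float, Any]],
    window_length: float,
    start_time: float = 0.0,
) -> Dict[int, List[Tuple[float, Any]]]:
    """Assign each (timestamp, payload) item to exactly one window index.

    Window c covers [start_time + c * T, start_time + (c + 1) * T), so the
    windows are disjoint and non-overlapping.
    """
    windows: Dict[int, List[Tuple[float, Any]]] = defaultdict(list)
    for timestamp, payload in items:
        index = int((timestamp - start_time) // window_length)
        windows[index].append((timestamp, payload))
    return dict(windows)
```

With a window length of 2 time units, an item stamped 1.5 falls into window 0 and an item stamped 2.0 falls into window 1, so no item belongs to two windows.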

The approach is described in Algorithm 1. When merging the old top-(K - m)
items and the m new items into the final result set, we strictly display the
results in chronological order (line 6 of Algorithm 1), since
social media users are used to following released information in chronological order.
Algorithm 1 has time complexity $O(|X_t| \cdot m)$, where $0 < m \le K$
and $|X_t| \ll \sum_{t \ge 0} |X_t| = N$. Compared with the traditional all-batch mode,
which has time complexity $O(N \cdot K)$, the dynamic preservation strategy
has much higher processing efficiency. $m$ is a control parameter that
flexibly controls the “staleness” of the result set. For example, if $m = K$,
Algorithm 1 prefers to display the most recent information about the topic.

Algorithm 1 Dynamic preservation scheme based on periodic windows
Input: S_{K,t-1} - result set with K items until time (t - 1)
       X_t - the item set in the new periodic time window
Output: S_{K,t} - result set with K items until time t
1: Initialize: S_{K,t} <- top-(K-m) of S_{K,t-1}
2: for i = 1, ..., m do
3:     bestDoc <- argmax_{x in X_t} f_{S_{K,t}}(x, R)
4:     S_{K,t} <- S_{K,t} U {bestDoc}
5: end for
6: Sort S_{K,t} by chronological order
7: return S_{K,t}
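
Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the `Item` class, its `relevance` field (used here to pick the preserved top-(K - m) items) and the `utility` callback signature are assumptions, and the sketch additionally skips candidates that are already in the result set.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Item:
    item_id: str
    timestamp: float
    relevance: float  # hypothetical score used to rank the previous result set


def dynamic_preservation(
    prev_result: Sequence[Item],
    window_items: Sequence[Item],
    k: int,
    m: int,
    utility: Callable[[Item, List[Item]], float],
) -> List[Item]:
    """One update step: keep top-(k - m) old items, greedily add m new ones."""
    if not 0 < m <= k:
        raise ValueError("m must satisfy 0 < m <= k")
    # Line 1: preserve the top-(k - m) items of the previous result set.
    result = sorted(prev_result, key=lambda x: x.relevance, reverse=True)[: k - m]
    candidates = [x for x in window_items if x not in result]
    # Lines 2-5: greedily pick the candidate with the highest utility,
    # where the utility may depend on the items selected so far.
    for _ in range(m):
        if not candidates:
            break
        best = max(candidates, key=lambda x: utility(x, result))
        result.append(best)
        candidates.remove(best)
    # Line 6: display in chronological order.
    result.sort(key=lambda x: x.timestamp)
    return result
```

In practice the `utility` callback would wrap a learned scoring function such as R-LTR; any function of a candidate and the current selection can be plugged in for experimentation.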


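Table 1. Summary of typical diversity measures [7]

measure   diversity                     novelty gain                   discount
α-NDCG    $\sum_{i=1}^{M} p_i (\cdot)$  $g_k^i (1 - \alpha)^{c_k^i}$   $D_k = \log(k + 1)$
ERR-IA    $\sum_{i=1}^{M} p_i (\cdot)$  $g_k^i (1 - \alpha)^{c_k^i}$   $D_k = k$
NRBP      $\sum_{i=1}^{M} p_i (\cdot)$  $g_k^i (1 - \alpha)^{c_k^i}$   $D_k = (1/\beta)^{k-1}$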

3 Dynamic Diversity Evaluation Measures 


Topic-focused dynamic information filtering is a new application problem in
social media, which incorporates relevance, diversity, recency and confidence of
information. It is therefore not easy to obtain a reasonable, comprehensive
evaluation for such a general task.

In the current Microblog task of TREC, the task evaluation
focuses only on retrieval relevance [23], and the evaluation metrics
are the traditional MAP and P@K. In the diversity task of the TREC Web
track, by contrast, the evaluation metrics take both relevance and diversity
into consideration, and include ERR-IA, α-NDCG and NRBP. However,
these existing measures do not take recency and confidence into consideration,
and are not suitable for evaluating the TDIF application problem. Based on
the above analysis, we propose a series of new dynamic diversity
evaluation measures to obtain a more reasonable evaluation for the TDIF task.

Firstly, we review the existing diversity evaluation measures summarized
in Table 1. These measures differ only in small components
such as the way of position discounting. There are two key points in
these measures: diversity and gain. Diversity means subtopic (or aspect)
coverage, which is based on explicit subtopic information of a query. For
a given subtopic, the gain describes redundancy penalization and position
discounting when accumulating relevance at every rank. Taking α-NDCG as an
example, it is formulated as follows:


$$\alpha\text{-NDCG} = \frac{1}{\mathcal{N}} \sum_{i=1}^{M} p_i \sum_{k=1}^{K} \frac{g_k^i (1 - \alpha)^{c_k^i}}{\log_2(k + 1)}$$

where $g_k^i$ is a binary relevance value for the document at position $k$ with respect to
subtopic $i$, $\alpha$ is a constant in $(0, 1]$, $c_k^i = \sum_{j=1}^{k-1} g_j^i$ is the number
of documents ranked before position $k$ that are judged relevant to subtopic $i$, $K$
is the number of documents in a ranking list, $M$ is the number of subtopics, $p_i$
is the probability of each subtopic, and $\mathcal{N}$ is a normalization factor.
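
The α-NDCG formula can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names and the input layout (each ranked document given as the set of subtopic ids it is relevant to) are assumptions, and the normalization factor is computed from an ideal ranking supplied by the caller rather than searched for.

```python
import math
from typing import Dict, Sequence, Set


def alpha_dcg(
    ranking: Sequence[Set[int]],
    subtopic_probs: Dict[int, float],
    alpha: float = 0.5,
    k: int = 20,
) -> float:
    """Unnormalized score: sum_i p_i sum_k g (1 - alpha)^c / log2(k + 1)."""
    seen = {i: 0 for i in subtopic_probs}  # c_k^i: relevant docs seen so far
    total = 0.0
    for position, doc_subtopics in enumerate(ranking[:k], start=1):
        discount = math.log2(position + 1)
        for i, p in subtopic_probs.items():
            if i in doc_subtopics:
                total += p * (1 - alpha) ** seen[i] / discount
                seen[i] += 1
    return total


def alpha_ndcg(
    ranking: Sequence[Set[int]],
    ideal_ranking: Sequence[Set[int]],
    subtopic_probs: Dict[int, float],
    alpha: float = 0.5,
    k: int = 20,
) -> float:
    """Normalize by the score of a caller-supplied ideal ranking."""
    ideal = alpha_dcg(ideal_ranking, subtopic_probs, alpha, k)
    if ideal == 0:
        return 0.0
    return alpha_dcg(ranking, subtopic_probs, alpha, k) / ideal
```

A ranking that repeats the same subtopic early is penalized through the $(1 - \alpha)^{c_k^i}$ term, so it scores below a ranking that covers a new subtopic at the same position.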

We incorporate recency and confidence factors into existing diversity evaluation
measures such as α-NDCG, and propose a new dynamic diversity
evaluation measure d-NDCG as follows:

$$d\text{-NDCG} = \frac{1}{\mathcal{N}} \sum_{i=1}^{M} p_i \sum_{k=1}^{K} \frac{\gamma^{t_{rcy}} \cdot g_k^i (1 - \alpha)^{c_k^i} \cdot u_r}{\log_2(k + 1)} \qquad (2)$$

and

$$t_{rcy} = \text{topic.timestamp} - \text{tweet.timestamp}$$

where topic.timestamp is the current time of topic tracking and tweet.timestamp
is the release time of the tweet. $\gamma$ is the corresponding trade-off
parameter, $0 < \gamma < 1$; we set $\gamma = 0.5$ in the following experiments. The $\gamma^{t_{rcy}}$ part
measures the recency of information. $u_r$ measures the confidence of information
via the weight of the user account [15,4].

In a real application, we usually need to rescale the values of $t_{rcy}$ and $u_r$
to the scale of the relevance label $g_k^i$. For example, the public Twitter dataset in
the TREC Microblog task has three grade labels: 2 (relevant), 1 (partly relevant) and
0 (not relevant). In the following experimental evaluation, we simply rescale
$t_{rcy}$ into three grade labels: 2 (i.e., history), 1 (i.e., recent) and 0 (i.e., latest) based
on a certain threshold, and rescale $u_r$ into three grade labels: 3 (i.e., significant
user), 2 (i.e., important user) and 1 (i.e., normal user).

Similarly, we can give the corresponding definitions of d-ERR and d-NRBP by
simply replacing the “gain” component in Table 1 with $\gamma^{t_{rcy}} \cdot g_k^i (1 - \alpha)^{c_k^i} \cdot u_r$.
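
The dynamic gain and the grade rescaling can be sketched as follows. This is an illustrative sketch under stated assumptions: the `JudgedTweet` class, the function names and the two recency thresholds are hypothetical (the text only says the rescaling is based on a certain threshold), and the score is left unnormalized.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class JudgedTweet:
    gains: Dict[int, float]  # subtopic id -> relevance label g_k^i
    t_rcy: int               # recency grade: 0 latest, 1 recent, 2 history
    u_r: int                 # user grade: 1 normal, 2 important, 3 significant


def recency_grade(age: float, recent_after: float, history_after: float) -> int:
    """Rescale a raw age (topic time minus tweet time) into grades 0, 1, 2.

    The two thresholds are hypothetical parameters for this example.
    """
    if age >= history_after:
        return 2
    if age >= recent_after:
        return 1
    return 0


def d_dcg(
    ranking: Sequence[JudgedTweet],
    subtopic_probs: Dict[int, float],
    alpha: float = 0.5,
    gamma: float = 0.5,
    k: int = 20,
) -> float:
    """Unnormalized d-NDCG: gain = gamma^t_rcy * g * (1 - alpha)^c * u_r."""
    seen = {i: 0 for i in subtopic_probs}
    total = 0.0
    for position, tweet in enumerate(ranking[:k], start=1):
        discount = math.log2(position + 1)
        for i, p in subtopic_probs.items():
            g = tweet.gains.get(i, 0.0)
            if g > 0:
                gain = (gamma ** tweet.t_rcy) * g * ((1 - alpha) ** seen[i]) * tweet.u_r
                total += p * gain / discount
                seen[i] += 1
    return total
```

With $\gamma = 0.5$, a tweet graded as history (2) contributes a quarter of the gain of an otherwise identical tweet graded as latest (0), while a significant user (3) triples the gain of a normal user (1).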


4 Experiments 

In this section, we evaluate the TDIF task from different aspects. We first
describe the experimental setup, which includes the dataset, evaluation metrics and
baseline methods. We then conduct extensive automatic evaluation of our
approach and the baseline strategies.


4.1 Experimental Setup 

Data Collections. We use the public Twitter dataset of the Microblog task of TREC
2011 and TREC 2012, which is a sample of approximately 16M tweets spanning
a period of 16 days. TREC 2011 provides 50 test topics, and TREC 2012
provides 60 test topics.

In our experiments, we keep only English tweets, and apply the Porter
stemmer to tweets and test topics. Because Microblog posts are
“short text”, we do not remove stopwords, to avoid information
loss. We use the Indri toolkit (version 5.2)¹ as the basic retrieval platform. We
also use the Twitter API² provided by TREC 2013 to retrieve several features
such as the follower number and retweet number.


Evaluation Metrics. We evaluate all the methods from two aspects:
effectiveness and efficiency. For effectiveness, we first use the representative
diversity measure α-NDCG, and then use the proposed dynamic diversity
measure d-NDCG. For both α-NDCG and d-NDCG, the cutoff is set to K = 20.

Both α-NDCG and d-NDCG need relevance labels at the subtopic
level, which the current public dataset does not provide.
Therefore, we perform further manual relevance labeling at the subtopic level, on the basis of
all the existing relevant tweets. The labeling method is very simple: for each
relevant tweet, we judge whether it covers a different subtopic compared with the prior
relevant tweets. If so, we treat it as relevant to a new subtopic. We label
2955 relevant tweets for 49 test topics in total for TREC 2011, and 6286
relevant tweets for 60 test topics in total for TREC 2012. On average, there are
3.6 subtopics under each test topic.

For efficiency, we measure the average processing time for each test topic. 


Baseline Methods. R-LTR has been shown to be a state-of-the-art diverse
ranking method. Therefore, in the TDIF task, we mainly focus on comparing
strategies rather than the detailed ranking models (or utility functions). The typical
baseline methods are as follows:

- All_old. The All_old strategy is the original R-LTR method optimized for the
traditional diversity measure α-NDCG; it ranks all the candidate
items at each new time point.

- All_new. The All_new strategy is similar to All_old, but uses the R-LTR
method optimized for the new dynamic diversity measure d-NDCG.

- TopRel. This method selects the K most relevant items in each new periodic
time window. Specifically, it uses the ListMLE method [24] as the utility function,
and displays results in chronological order. This method does not consider the
diversity requirement, which is similar to the approach used in industry.

Our proposed “Dynamic Preservation scheme” is denoted as “DP”. It
is based on the R-LTR utility function optimized for α-NDCG. Unless otherwise
stated, the default value of the parameter m is 10.

For proper evaluation, we choose ‘2 days’ as the time unit. With a
window size of less than 2 days, there are not enough relevant tweets for each
test topic in our dataset. In a real application scenario, any suitable window
size can be chosen.

1 http://lemurproject.org/indri 

2 https://github.com/lintool/twitter-tools 



We use the tweet data from the first two days as training data for the utility
functions ListMLE and R-LTR; for the detailed training process, see the
corresponding literature [24,26].

4.2 Evaluation on Traditional Diversity Measure 



Fig. 2. Performance comparison on α-NDCG measure


We first use the traditional diversity measure α-NDCG for evaluation;
the detailed results are shown in Figure 2. The horizontal axis shows
time points in chronological order, and the vertical axis shows the α-NDCG score.

From the figure, we can observe that All_old performs best, which is in
accordance with our intuition. All_new performs worse than All_old because it is optimized
for the new diversity measure. The All_Batch strategies (i.e., All_old and All_new)
rank all the candidate items at each time point. Therefore, they perform better
than the two other approaches. Our DP approach performs slightly worse than, but
close to, the All_Batch strategies. In fact, the DP method can be viewed
as an approximation of All_old in the online data stream scenario. It captures
recency better at the cost of a small loss in α-NDCG. TopRel
performs worst because it only considers the relevance requirement, although it is
easy to apply and commonly used in industry.


4.3 Evaluation on Dynamic Diversity Measure 

d-NDCG takes recency and confidence into consideration in addition to relevance
and diversity. The evaluation results on d-NDCG are shown in Figure 3.

We can see that the proposed DP approach performs best among all
methods. Although it is optimized directly for the d-NDCG measure, All_new still
performs worse than DP, because the DP strategy enforces capturing more recency
through periodic time windows. Combined with the results in Figure 2, All_old
and All_new each perform better under the diversity measure they are
optimized for. TopRel performs worst among all baselines, which is also consistent with
the evaluation results in Figure 2.

Fig. 3. Performance comparison on d-NDCG measure


4.4 Efficiency Evaluation 

Fig. 4. Average processing time for each topic (unit: millisecond)


An important requirement of the TDIF task is processing efficiency for the
online data stream. Therefore, we evaluate efficiency using the average
processing time for each test topic.

The evaluation results are shown in Figure 4. Here we use ‘All’ to denote both
the All_old and All_new strategies, since they have nearly the same efficiency. We can
see that the All_Batch strategy has the lowest efficiency, because it processes all
the candidate items at each time point. The DP strategy shows much higher
efficiency than the All_Batch strategy, which is consistent with the theoretical analysis
in Section 2.2: it chooses m items from a relatively small candidate set
at each time point.

TopRel shows lower efficiency than DP, but higher than All_Batch, because it
chooses 20 items in each time window and is therefore slower than the DP approach
(default m = 10). In fact, the TopRel method ignores diversity
relations, so it runs faster than the DP approach when m = 20, as
shown in the following evaluation of sensitivity to the parameter m.
























Fig. 5. Parameter sensitivity analysis: (a) d-NDCG; (b) Average processing time

4.5 Parameter Sensitivity 

In our DP approach, the parameter m ($0 < m \le K$) controls the “staleness” of
the result set. In this subsection, we evaluate its effect from two aspects:
d-NDCG and efficiency.

We consider three settings: m = 5, m = 10 and m = 20. The evaluation
results are shown in Figure 5. In terms of d-NDCG (subfigure (a)),
m = 10 performs best, followed by m = 20
and m = 5. In terms of efficiency (subfigure (b)), m = 5
performs best, followed by m = 10 and m = 20. Therefore, considering
both aspects, m = 10 gives the best overall performance,
and it is set as the default parameter value in our work.

Additionally, when m = 20, the processing time is between 20 and 25 milliseconds,
which is slower than the TopRel method (whose average processing time is about 20
milliseconds, as shown in Figure 4), due to the consideration of diversity relations.

5 Related Work 

Most existing research studies the problem of diverse ranking in a static
dataset scenario. These methods try to find an optimal or suboptimal subset of
a static data set. With the development of new social media such as Twitter,
the ranking scenario has changed. In this new scenario, new information is
continuously released online as a data stream, and how to process stream
information effectively has become a new challenging problem.

Little research has addressed the dynamic data stream scenario;
representative work includes [13,14,20]. Drosou et al. [13] make a heuristic
attempt on the “publish/subscribe” scenario. Specifically, they define
‘diversity on sliding window’, and use the classical “Max-Sum” objective
as the utility function in a heuristic greedy strategy. This idea
also inspires their follow-up work [14], which focuses on the
efficient computation of dynamic diversity via a “cover tree” indexing scheme
that supports efficient update operations such as insertion and deletion.
Minack et al. [20] define “incremental diversity”. In their work, they
maintain a near-optimal diverse set at any point in the data stream. The authors
use the classical “Max-Sum” and “Max-Min” objectives as utility functions in
a heuristic interchange scheme. For each newly arriving item, the scheme
decides whether to discard or insert it, so as to maintain the diversity of the result set.

With the rise of social media, much research has focused on this area.
Chen et al. discuss and analyze content recommendation in Twitter
along several feature dimensions. Hong et al. focus on how to build effective
systems for ranking social updates on LinkedIn. They leverage ideas from
information retrieval and recommender systems, and show promising
performance. Choudhury et al. [12] focus on topic retrieval in Twitter,
to obtain the most relevant results. However, their work is still limited to the search
scenario, which is almost the same as traditional Web search.

Overall, compared with prior research, our work differs in several
ways: (1) The research problem is different. Our work aims to
tackle topic-focused dynamic information filtering in social media, which is a
new application problem. (2) Our approach also differs in its details.
We use a different utility function (i.e., R-LTR), which is a supervised
feature-based ranking model that adapts well to different application problems.
Our dynamic preservation scheme also differs from prior work.

6 Conclusions 

In this paper, we investigate the problem of topic-focused dynamic information
filtering in social media. We first analyze the properties of the application
problem, which has several typical requirements: relevance, diversity, recency
and confidence. In this scenario, how to balance these factors properly is very
important. We then propose to use the R-LTR model, combined with a
dynamic preservation scheme based on periodic time windows, to solve the TDIF
problem. In this way, we can capture these factors effectively. Given the new
requirements of the TDIF problem, we propose new dynamic diversity measures to
obtain a more reasonable evaluation for such application problems; these measures take
recency and confidence into consideration in addition to relevance and diversity. We
conduct extensive evaluations on the public TREC Twitter dataset, and the
experimental results support the effectiveness of our approach.

Overall, we present a complete investigation of a typical application problem
in social media, covering the problem analysis, solution and evaluation.
Our work sheds some light on such application problems, which is significant for
future research.

References 

1. R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results.
In Proceedings of the 2nd ACM WSDM, pages 5-14, 2009.

2. J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for
reordering documents and producing summaries. In Proceedings of the 21st ACM
SIGIR, pages 335-336, 1998.

3. O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for
graded relevance. In Proceedings of the 18th ACM CIKM, pages 621-630, 2009.




4. C. Chen, F. Li, B. C. Ooi, and S. Wu. Ti: An efficient indexing mechanism for 
real-time search on tweets. In Proceedings of the SIGMOD , pages 649-660, 2011. 

5. J. Chen, R. Nairn, and E. Chi. Speak little and well: Recommending conversations 
in online social streams. In Proceedings of the SIGCHI, pages 217-226, 2011. 

6. J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi. Short and tweet:
Experiments on recommending content from information streams. In Proceedings of the
SIGCHI Conference, pages 1185-1194, 2010.

7. C. L. Clarke, N. Craswell, I. Soboroff, and A. Ashkan. A comparative analysis
of cascade measures for novelty and diversity. In Proceedings of the 4th WSDM,
pages 75-84, 2011.

8. C. L. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher,
and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In
Proceedings of the 31st ACM SIGIR, pages 659-666, 2008.

9. C. L. Clarke, M. Kolla, and O. Vechtomova. An effectiveness measure for
ambiguous and underspecified queries. In Proceedings of the 2nd ICTIR, 2009.

10. K. Collins-Thompson, P. Bennett, F. Diaz, C. L. Clarke, and E. M. Voorhees.
Overview of the TREC 2011 web track. In TREC, 2011.

11. V. Dang and W. B. Croft. Diversity by proportionality: an election-based approach 
to search result diversification. In Proceedings of the 35th ACM SIGIR, 2012. 

12. M. De Choudhury, S. Counts, and M. Czerwinski. Identifying relevant social media 
content: Leveraging information diversity and user cognition. In Proceedings of the 
22nd ACM HT, pages 161-170, 2011. 

13. M. Drosou and E. Pitoura. Diversity over continuous data. IEEE Data Eng. Bull, 
32(4):49-56, 2009. 

14. M. Drosou and E. Pitoura. Dynamic diversification of continuous data. In
Proceedings of the 15th EDBT, EDBT ’12, pages 216-227, 2012.

15. Y. Duan, L. Jiang, T. Qin, M. Zhou, and H.-Y. Shum. An empirical study on 
learning to rank of tweets. In Proceedings of the 23rd COLING, 2010. 

16. S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In 
Proceedings of the 18th WWW, pages 381-390, 2009. 

17. T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd 
ACM SIGIR, SIGIR ’99, pages 50-57, 1999. 

18. J. Lin and M. Efron. Overview of the TREC-2013 microblog track. In Proceedings
of TREC 2013, 2013.

19. T.-Y. Liu. Learning to Rank for Information Retrieval. Springer, 2011. 

20. E. Minack, W. Siberski, and W. Nejdl. Incremental diversification for very large
sets: A streaming-based approach. In Proceedings of the 34th SIGIR, 2011.

21. I. Ounis, C. Macdonald, J. Lin, and I. Soboroff. Overview of the TREC-2011
microblog track. In Proceedings of TREC 2011, 2011.

22. R. L. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web 
search result diversification. In Proceedings of the 19th WWW, 2010. 

23. I. Soboroff, I. Ounis, C. Macdonald, and J. Lin. Overview of the TREC-2012
microblog track. In Proceedings of TREC 2012, 2012.

24. F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li. Listwise approach to learning 
to rank: theory and algorithm. In Proceedings of the 25th ICML, 2008. 

25. Y. Yue and T. Joachims. Predicting diverse subsets using structural svms. In 
Proceedings of the 25th ICML, pages 1224-1231, 2008. 

26. Y. Zhu, Y. Lan, J. Guo, X. Cheng, and S. Niu. Learning for search result
diversification. In Proceedings of the 37th ACM SIGIR, 2014.



